*----------------------------------------------------------------------------------------------------------------------------------*

*Replication_DempseyIonescu_ProbitRegression.do contains codes for probit regression results in the paper shown in tables 3 and A.1, as well as figure 5.

*----------------------------------------------------------------------------------------------------------------------------------*

clear all
use "/data/***"

************************

*Variable generation

************************

/*
Generate regression variables

-uniqueid: The combination of bank RSSD ID and the unique reference number for a given account. The variable is derived from the following original Y-14M variables: 
	-bankid, line item 3
	-referencenumber, line item 1

*/	

gen uniqueid = bankid + "&" + referencenumber

/*
-fico_mean_2018: The mean adjusted FICO score for a given account across the sample period of 2018. The variable is derived from the following original Y-14M variables: 
	-originalcreditscoreprimaryborrower, line item 39
	-refreshedcreditscoreprimaryborrower, line item 40
	-originalcreditscorename, line item 109
	-refreshedcreditscorename, line item 110
	-bankid, line item 3
	
The final credit score, stored as fico_score, is based on various combos of credit score name version and bank id, as detailed below.

The mean of this value is then taken across all the 12 months in the sample
*/

gen fico_score = .
replace fico_score = int(49.9067 + 0.9161 * refreshedcreditscoreprimaryborrower) if bankid == *** & (refreshedcreditscoreprimaryborrower != . | refreshedcreditscoreprimaryborrower == 0) & refreshedcreditscorename == 3
replace fico_score = int(49.9067 + 0.9161 * originalcreditscoreprimaryborrower) if bankid == *** & originalcreditscorename == 3
replace fico_score = refreshedcreditscoreprimaryborrower if (refreshedcreditscoreprimaryborrower == . | refreshedcreditscoreprimaryborrower == 0) & refreshedcreditscorenameversion == 1
replace fico_score = originalcreditscoreprimaryborrower if originalcreditscorenameversion == 1

bysort uniqueid: egen mean_fico_2018 = mean(fico_score)
 
 /*
 - m: The month of the sample. Taken from the Y-14M variable periodid, line item 4
 */
 
gen m = month(periodid)

/*
-revolver: Taken as the complement of transactor.
-transactor: Defined using the Y-14M variables:
	-cycleendingbalance, line item 15
	-actualpaymentamount, line item 51
In each month, the difference between the lagged cycleendingbalance (the previous month) and the current actualpaymentamount is calculated. If that value is positive, then the account is considered revolving for that month, saved as is_revolver. A count is then taken for how many months in the sample (12 total) the account revolved, stored as the variable revolver_count. For the most simple calculation, transactors are defined only as accounts which never revolve in the 12 month period, and revolvers as all others. This process is done in earlier cleaning code, with the process replicated below, though care must be taken to do this process only within each specific account. 
 
-mean_revolved_amount: Simular to the revolver alculation, except in this case the actual difference between lagged cycleendingbalance and actualpaymentamount is recorded, and then the mean value over 2018 is taken for a given account.
*/

sort uniqueid periodid

gen is_revolver = .
replace is_revolver = 1 if cycleendingbalance[_n-1] - actualpaymentamount > 0
replace is_revolver = 0 if cycleendingbalance[_n-1] - actualpaymentamount <= 0

gen revolver_amount = cycleendingbalance[_n-1] - actualpaymentamount > 0

bysort uniqueid: egen revolver_count = sum(is_revolver)
gen transactor = 1
replace transactor = 0 if revolver_count > 0

bysort uniqueid: egen mean_revolved_amount = mean(revolver_amount)

gen revolver = 1
replace revolver = 0 if transactor == 1

/*
-one_def, two_defs, three_defs: Expanding defenitions of account default, using the following Y-14M variables:
	-cycleendclosedrevokedflag, line item 100
	-accountcycleenddelinquency, line item 86
	-bankruptcyflag, line item 69
	
Preliminary flags chargeoff and month4delinquent are first created based off previous variables, and then are used to create defenitions of default. Each definition is a subset of the subsequent defenition. These variables are pulled in 2019, one year after the original sample, so as to reflect outcomes in the following year, and then merged with 2018 data using unique id.
*/

gen chargeoff = 0
replace chargeoff = 1 if cycleendclosedrevokedflag == 4

gen month4delinquent = 0
replace month4delinquent == 1 if accountcycleenddelinquency >= 4

gen one_def = 0
replace one_def = 1 if bankruptcyflag >= 1
gen two_defs = 0
replace two_defs = 1 if (one_def == 1 | chargeoff == 1)
gen three_defs = 0
replace three_defs = 1 if (two_defs == 1 | month4delinquent == 1)

/*
-mean_borrower_income: The mean borrower income over 2018. Uses the Y-14M variable borrowerincome, line item 33, adjusted for inflation from the origination date (since it is origination income) into 2018 dollars to get borrowerincome_inf. The mean of this value is then taken over 2018.
*/

bysort uniqueid: egen mean_borrower_income = sum(borrowerincome_inf)

/*
-mult_flag: A flag indicating if customer has multiple relationships with a bank. Defined in two stages, first by establishing if a customer has mltiple banking relationships, or by seeing if they also have a secondary card relationship. Uses the following Y-14M variables:
	-multiplebankingrelationshipflag, line item 28
	-multiplecreditcardrelationshipflag, line item 29
*/

gen mult_rel_flag = 0
replace mult_rel_flag = 1 if (multiplebankingrelationshipflag != . & multiplebankingrelationshipflag != 0 & multiplebankingrelationshipflag != 9)
gen mult_flag = 0
replace mult_flag =1 if (mult_rel_flag == 1 | multiplecreditcardrelationshipflag == 1)


/*
-new_account: A flag indicating if the account is new in 2018 using the accounts origination date. Uses the following Y-14M variables:
	-accountoriginationdate, line item 26
*/

gen new_account = 0
replace new_account = 1 if year(accountoriginationdate) == 2018


************************

*Regressions

************************

cd "/data/***/output"

*Table 3 ----------------------------------------------------------------------------------------------------------------

*Each regression adds an additional regressor. In all cases the pseudo-R^2 is calculated as well as the marginal effects.

*Regression one: Just Fico

eststo fico_est: probit two_defs fico_mean_2018
predict fico_phat
gen fico_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store fico_m
estimates drop fico_est

*Regression two: Add in income

eststo income_est: probit two_defs fico_mean_2018 mean_borrower_income
predict income_phat
gen income_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store income_m
estimates drop income_est

*Regression three: Multirelation
eststo multi_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag
predict multi_phat
gen multi_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store multi_m
estimates drop multi_est

*Regression four: Revolver

eststo revolver_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver
predict revolver_phat
gen revolver_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store revolver_m
estimates drop revolver_est

*Regression five: New Account

eststo account_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account
predict account_phat
gen account_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store account_m
estimates drop account_est

*Regression six: Time FE

eststo time_fe_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account i.m
predict time_fe_phat
gen time_fe_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store time_fe_m
estimates drop time_fe_est

*Regression seven: Bank FE

eststo bank_fe_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account i.m i.id_rssd
predict bank_fe_phat
gen bank_fe_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store bank_fe_m
estimates drop bank_fe_est

*Note that the predicted values from this regression, bank_fe_phat, are used in the OLS analysis and summary tables

*An output table is created which stores all regression coefficients and is stored as table_3.tex. Since Pseudo-R^2 values are used, those must be read directly from the data as fico_r, income_r, multi_r, revolver_r, account_r, time_fe_r, and bank_fe_r, for each regression specification respectively.

esttab fico_m income_m multi_m revolver_m account_m time_fe_m bank_fe_m using table_3.tex, numbers nobase pr2 title(Table 3) keep(fico_mean_2018 mean_borrower_income) indicate("Multiple Relationships = *.mult_flag" "Revolver Status = revolver" "New Account = new_account" "Time FE = *.m" "Bank FE = *.id_rssd" , labels("X")) replace

*Table A.1 --------------------------------------------------------------------------------------------------------------

*Each regression adds an additional regressor. In all cases the pseudo-R^2 is calculated as well as the marginal effects.

*Regression one: Just Fico

eststo fico_est: probit three_defs fico_mean_2018
predict fico_phat
gen fico_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store fico_m
estimates drop fico_est

*Regression two: Add in income

eststo income_est: probit three_defs fico_mean_2018 mean_borrower_income
predict income_phat
gen income_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store income_m
estimates drop income_est

*Regression three: Multirelation
eststo multi_est: probit three_defs fico_mean_2018 mean_borrower_income i.mult_flag
predict multi_phat
gen multi_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store multi_m
estimates drop multi_est

*Regression four: Revolver

eststo revolver_est: probit three_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver
predict revolver_phat
gen revolver_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store revolver_m
estimates drop revolver_est

*Regression five: New Account

eststo account_est: probit three_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account
predict account_phat
gen account_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store account_m
estimates drop account_est

*Regression six: Time FE

eststo time_fe_est: probit three_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account i.m
predict time_fe_phat
gen time_fe_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store time_fe_m
estimates drop time_fe_est

*Regression seven: Bank FE

eststo bank_fe_est: probit three_defs fico_mean_2018 mean_borrower_income i.mult_flag revolver new_account i.m i.id_rssd
predict bank_fe_three_phat
gen bank_fe_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store bank_fe_m
estimates drop bank_fe_est

*Note that the predicted values from this regression, bank_fe_three_phat, are used in the summary tables

*An output table is created which stores all regression coefficients and is stored as table_A1.tex. Since Pseudo-R^2 values are used, those must be read directly from the data as fico_r, income_r, multi_r, revolver_r, account_r, time_fe_r, and bank_fe_r, for each regression specification respectively.

esttab fico_m income_m multi_m revolver_m account_m time_fe_m bank_fe_m using table_A1.tex, numbers nobase pr2 title(Table A.1) keep(fico_mean_2018 mean_borrower_income) indicate("Multiple Relationships = *.mult_flag" "Revolver Status = revolver" "New Account = new_account" "Time FE = *.m" "Bank FE = *.id_rssd" , labels("X")) replace

*Figure 5 --------------------------------------------------------------------------------------------------------------

*This probit regression is an expansion of the previous, and uses two stages to calculate an ex post default probibility: First it runs a probit to predict the likelihood of revolving, then a second on the likelihood of default, aaking the ratio of the two as p_star, a combined measure how likely one is first to revolve, and then to actually default.

*A stricter version or revolving is used, having revolved for at least three months
drop revolver
gen revolver = 0
replace revolver = 1 if revolver_count >= 3

*The first regression looks to see the likelihood of being a revolver

eststo revolve_prob_est: probit revolver fico_mean_2018 mean_borrower_income i.mult_flag  i.new_account i.m i.id_rssd
*predict bank_fe_phat
predictnl revolve_prob = predict(pr), ci(lbr ubr)
gen revolve_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store revolve_m
estimates drop revolve_prob_est

*Revolver is then redefined to the original definition (revolving at least once)
drop revolver
gen revolver = 1
replace revolver = 0 if transactor == 1

*The second regression looks at the probibility of defaulting

eststo two_defs_est: probit two_defs fico_mean_2018 mean_borrower_income i.mult_flag new_account revolver i.m i.id_rssd
*predict bank_fe_phat
predictnl default_prob = predict(pr), ci(lb2 ub2)
gen two_defs_r = 1 - e(ll)/e(ll_0) 
margins, dydx(*) post
estimates store two_defs_m
estimates drop two_defs_est


*p_star is generated as the ratio of the two probibility measures
gen p_star = default_prob/revolve_prob
